Goto

Collaborating Authors: dropout training



On Convergence and Generalization of Dropout Training

Neural Information Processing Systems

We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations. Under mild overparametrization, and assuming that the limiting kernel can separate the data distribution with a positive margin, we show that dropout training with logistic loss achieves $\epsilon$-suboptimality in test error in $O(1/\epsilon)$ iterations.


Dropout Training as Adaptive Regularization

Neural Information Processing Systems

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an $L_2$ regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learner, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer.
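
The first-order equivalence described above can be sketched numerically. The toy logistic-regression setup below (all sizes, names, and the diagonal Fisher estimate are illustrative assumptions, not code from the paper) contrasts inverted-dropout feature noising with the corresponding $L_2$ penalty on Fisher-scaled features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative logistic-regression setup (not the paper's experiments).
n, d, delta = 200, 5, 0.5          # delta = dropout (deletion) probability
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
p = 1.0 / (1.0 + np.exp(-X @ w))   # model probabilities under weights w

# Dropout as feature noising: multiply by a Bernoulli keep-mask and rescale
# by 1/(1 - delta) so the noised features are unbiased for X.
mask = rng.binomial(1, 1.0 - delta, size=X.shape)
X_noised = X * mask / (1.0 - delta)

# First-order equivalent penalty: an L2 penalty after scaling by an estimate
# of the diagonal Fisher information. For logistic regression the diagonal
# estimate is (1/n) * sum_i p_i (1 - p_i) x_ij^2.
fisher_diag = (p * (1.0 - p)) @ (X ** 2) / n
penalty = 0.5 * delta / (1.0 - delta) * np.sum(fisher_diag * w ** 2)
print(penalty)
```

The rescaling step is standard inverted dropout; the penalty line is the quadratic form that the abstract's "L2 after Fisher scaling" statement refers to, written for the logistic case.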




Information Geometry of Dropout Training

Kimura, Masanari, Hino, Hideitsu

arXiv.org Machine Learning

Deep neural networks have been experimentally successful in a variety of fields (Deng and Yu, 2014; LeCun et al., 2015; Goodfellow et al., 2016). Dropout is one of the techniques that contribute to the performance improvement of neural networks (Srivastava et al., 2014). Many experimental results have reported the effectiveness of dropout, making it an important technique for training neural networks (Wu and Gu, 2015; Pham et al., 2014; Park and Kwak, 2016; Labach et al., 2019). Furthermore, the simplicity of the idea behind dropout has led to the proposal of a great number of variants (Iosifidis et al., 2015; Moon et al., 2015; Gal et al., 2017; Zolna et al., 2017; Hou and Wang, 2019; Keshari et al., 2019; Ma et al., 2020). Understanding the behavior of such an important technique helps clarify which of these variants to use, and in what cases dropout is effective in the first place.


On Convergence and Generalization of Dropout Training

Mianjy, Poorya, Arora, Raman

arXiv.org Machine Learning

We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations. Under mild overparametrization and assuming that the limiting kernel can separate the data distribution with a positive margin, we show that dropout training with logistic loss achieves $\epsilon$-suboptimality in test error in $O(1/\epsilon)$ iterations.
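
The setting analyzed above can be illustrated with a toy sketch: a two-layer ReLU network whose hidden units are dropped out during training, fitted with logistic loss. Everything below (sizes, learning rate, separable toy labels, and training only the output layer) is an illustrative assumption, not the authors' construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network with inverted dropout on the hidden layer.
n, d, m, q, lr = 100, 10, 64, 0.5, 0.1    # q = keep probability
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)           # linearly separable toy labels
W = rng.normal(size=(d, m)) / np.sqrt(d)  # first-layer weights (kept fixed)
a = rng.normal(size=m) / np.sqrt(m)       # output-layer weights (trained)

for _ in range(200):
    mask = rng.binomial(1, q, size=m) / q  # inverted dropout on hidden units
    h = np.maximum(X @ W, 0.0) * mask      # masked ReLU features
    p = 1.0 / (1.0 + np.exp(-(h @ a)))     # predicted probabilities
    grad_a = h.T @ (p - y) / n             # gradient of the mean logistic loss
    a -= lr * grad_a

# At test time dropout is switched off (the mask equals 1 in expectation).
test_p = 1.0 / (1.0 + np.exp(-(np.maximum(X @ W, 0.0) @ a)))
err = np.mean((test_p > 0.5) != (y > 0.5))
print(err)
```

Freezing the first layer makes each step a stochastic gradient step on a convex logistic objective over random ReLU features, which keeps the sketch short; the paper's analysis covers the full network.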


Machine Learning's Dropout Training is Distributionally Robust Optimal

Blanchet, Jose, Kang, Yang, Olea, Jose Luis Montiel, Nguyen, Viet Anh, Zhang, Xuhui

arXiv.org Machine Learning

This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician's covariates using a multiplicative nonparametric errors-in-variables model. In this game---known as a Distributionally Robust Optimization problem---nature's least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some fixed probability $\delta$. Our decision-theoretic analysis shows that dropout training---the statistician's minimax strategy in the game---indeed provides out-of-sample expected loss guarantees for distributions that arise from multiplicative perturbations of in-sample data. This paper also provides a novel, parallelizable, Unbiased Multi-Level Monte Carlo algorithm to speed up the implementation of dropout training. Our algorithm has a much smaller computational cost compared to the naive implementation of dropout, provided the number of data points is much smaller than the dimension of the covariate vector.
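
Nature's least favorable distribution described above (dropout noise that independently deletes covariate entries with probability $\delta$) can be simulated directly. The Monte Carlo sketch below uses a linear model with squared loss and standard inverted-dropout rescaling; all of these choices are illustrative assumptions, not the paper's Unbiased Multi-Level Monte Carlo algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a fixed linear model evaluated under dropout-corrupted covariates.
n, d, delta = 50, 4, 0.3
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = X @ beta + 0.1 * rng.normal(size=n)

def corrupted_loss(n_draws=2000):
    """Average squared loss when nature applies dropout noise to X."""
    total = 0.0
    for _ in range(n_draws):
        # Delete each entry with probability delta; rescale to keep the
        # corrupted covariates unbiased (inverted dropout).
        mask = rng.binomial(1, 1.0 - delta, size=X.shape) / (1.0 - delta)
        total += np.mean((y - (X * mask) @ beta) ** 2)
    return total / n_draws

clean = np.mean((y - X @ beta) ** 2)    # in-sample loss, no corruption
robust = corrupted_loss()               # expected loss under dropout noise
print(clean, robust)
```

Because the multiplicative noise is unbiased, the corrupted loss equals the clean loss plus a variance term, so `robust` exceeds `clean`; that gap is what the minimax guarantee in the abstract controls.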